Specialization in Artificial Intelligence - IFMG-OP

Time Series Analysis and Forecasting

Prof. Carlos Severiano - Student: Fernando dos Santos Alves Fernandes


Forecasting Reference Evapotranspiration (ETo)

Introduction¶

Evapotranspiration is the process by which a crop "loses" water to the atmosphere. It consists of evaporation and transpiration occurring simultaneously. While evaporation is the change of water from the liquid state to the gaseous state (vapor) from any surface (lakes, rivers, pavement, soil, and vegetated areas), transpiration is the "loss" of water from the leaves, flowers, stems, and roots of vegetation.
The main factors affecting the evapotranspiration process are:

  • climatic parameters (radiation, air temperature, relative humidity, wind speed);
  • crop-related factors (plant variety, growth stage, ground cover, root characteristics);
  • environmental and soil-management conditions (soil salinity, soil fertility, limitations on fertilizer use, lack of disease control);

Fig. 1 - Evapotranspiration, according to the factors that affect its calculation.

Fig. 2 - Relationship among the evapotranspiration types.

Fig. 3 - FAO* Penman-Monteith equation for computing ETo.

*Food and Agriculture Organization of the United Nations (present in Brazil since 1949).

where:

  • ETo - reference evapotranspiration [mm.day-1]
  • Rn - net radiation at the crop surface [MJ.m-2.day-1]
  • G - soil heat flux density [MJ.m-2.day-1]
  • T - mean daily air temperature at 2 m height [C]
  • u2 - wind speed at 2 m height [m.s-1]
  • es - saturation vapour pressure [kPa]
  • ea - actual vapour pressure [kPa]
  • (es - ea) - saturation vapour pressure deficit [kPa]
  • delta - slope of the vapour pressure curve [kPa.C-1]
  • gamma - psychrometric constant [kPa.C-1]
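Since Fig. 3 is available only as an image, the FAO-56 Penman-Monteith equation can also be written out directly. The sketch below is a minimal Python rendering of it; the input values on the final line are illustrative mid-range numbers, not taken from any dataset used here.

```python
def eto_penman_monteith(Rn, G, T, u2, es, ea, delta, gamma):
    """FAO-56 Penman-Monteith reference evapotranspiration [mm.day-1]."""
    # 0.408 converts MJ.m-2.day-1 of available energy into mm.day-1 of water
    numerator = 0.408 * delta * (Rn - G) + gamma * (900.0 / (T + 273.0)) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator

# Illustrative input values (hypothetical, not from the Pará de Minas series)
print(round(eto_penman_monteith(Rn=13.28, G=0.0, T=16.9, u2=2.078,
                                es=1.997, ea=1.409, delta=0.122, gamma=0.0666), 2))  # 3.88
```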

Advantages:

  • Serves as a reference for comparing ET across different periods of the year, different regions, and different crops;
  • The calculation depends only on meteorological data;
  • Historical records make forecasting possible;
  • Supports irrigation planning and management;

Disadvantages:

  • Depends on meteorological data;

Ref.:

ALLEN, Richard G.; PEREIRA, Luis S.; RAES, Dirk; SMITH, Martin. Crop evapotranspiration - Guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56.


Required libraries:¶

In [ ]:
import os
import numpy as np
from numpy import array
from numpy import concatenate

import pandas as pd
from pandas import concat

import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 7

from math import sqrt

from datetime import datetime

from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import mean_absolute_error

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

import statsmodels.tsa.api as smt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox
from tqdm import tqdm_notebook
from typing import Union
from itertools import product

import pmdarima as pm
from pmdarima.model_selection import train_test_split
from pmdarima.arima import auto_arima

import seaborn as sns

from scipy.stats import chi2

# import geobr
from geopy.geocoders import Nominatim  # needed for the geolocator created below

import warnings
warnings.filterwarnings('ignore')

Helper function definitions:¶

In [ ]:
def teste_estacionariedade(X, cutoff=0.01):
    # The cutoff (here 0.01) is the threshold below which the ADF p-value suggests stationarity
    pvalue = adfuller(X)[1]
    if pvalue < cutoff:
        print('valor-p = ' + str(pvalue) + '. A série ' + X.name + ' parece ser estacionária.')
        return True
    else:
        print('valor-p = ' + str(pvalue) + '. A série ' + X.name + ' parece ser não-estacionária.')
        return False
In [ ]:
def split_sequence(sequence, n_lags):

    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_lags
        # Stop when the window would run past the end of the sequence
        if end_ix > len(sequence)-1:
            break
        # Build the x (lag window) and y (next value) parts of each sample
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
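A quick check of the windowing logic (the function is re-stated so the snippet runs standalone; the sample sequence is arbitrary):

```python
import numpy as np

def split_sequence(sequence, n_lags):
    # Same logic as the helper above: sliding windows of length n_lags,
    # each paired with the next value as the target.
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_lags
        if end_ix > len(sequence) - 1:
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix])
    return np.array(X), np.array(y)

X, y = split_sequence([10, 20, 30, 40, 50], n_lags=2)
print(X.tolist())  # [[10, 20], [20, 30], [30, 40]]
print(y.tolist())  # [30, 40, 50]
```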
In [ ]:
def find_coordinates(file_name):
    lat = ''
    lon = ''
    for item in file_name.split(sep='_'):
        if item.startswith('lat'):
            lat = float(item.split(sep='lat')[1])
        elif item.startswith('lon'):
            lon = float(item.split(sep='lon')[1])
    print(f'Coordinates of {file_name}: ({lat}, {lon})')
    location = geolocator.geocode(str(lat) + "," + str(lon))  # requires the Nominatim geolocator defined below
    print(f'\tLocation: {location}')
In [ ]:
def return_coordinates(file_name, lat=None, lon=None):
    # Extracts (lat, lon) from a file name such as 'lat-19.75_lon-44.45'
    for item in file_name.split(sep='_'):
        if item.startswith('lat'):
            lat = float(item.split(sep='lat')[1])
        elif item.startswith('lon'):
            lon = float(item.split(sep='lon')[1])
    return [lat, lon]
In [ ]:
def normalize(df):
    mindf = df.min()
    maxdf = df.max()
    return (df-mindf)/(maxdf-mindf)

def denormalize(norm, _min, _max):
    return [(n * (_max-_min)) + _min for n in norm]
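A round-trip sanity check of the min-max scaling helpers (re-stated so the snippet runs standalone, applied to an arbitrary three-value series):

```python
import pandas as pd

def normalize(df):
    # Min-max scaling to [0, 1]; column-wise when applied to a DataFrame
    mindf = df.min()
    maxdf = df.max()
    return (df - mindf) / (maxdf - mindf)

def denormalize(norm, _min, _max):
    # Inverse of normalize for a list of scaled values
    return [(n * (_max - _min)) + _min for n in norm]

s = pd.Series([2.2, 2.4, 4.7])
norm = normalize(s)
restored = denormalize(norm.tolist(), s.min(), s.max())
print([round(v, 1) for v in restored])  # [2.2, 2.4, 4.7]
```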

Choosing one of the 152 available ETo datasets:¶

  • Alexandre C. Xavier, Carey W. King, Bridget R. Scanlon. Daily gridded meteorological variables in Brazil (1980–2013)
  • Alexandre C. Xavier, Bridget R. Scanlon, Carey W. King, Aline I. Alves. New improved Brazilian daily weather gridded data (1961–2020)
In [ ]:
origin = "datasets/only-clima/"
geolocator = Nominatim(user_agent='geoapiFSAF')
In [ ]:
file_names = []

for path, subdirectory, files in os.walk(origin):
    for name in files:
        file_names.append(name.split(sep='.csv')[0])

len(file_names)
Out[ ]:
152
In [ ]:
latitudes = []
longitudes = []

for fn in file_names:
    lat, lon = return_coordinates(fn, latitudes, longitudes)
    latitudes.append(lat)
    longitudes.append(lon)
In [ ]:
plt.figure(figsize=(10, 8))
plt.scatter(longitudes, latitudes, marker='.')
Out[ ]:
<matplotlib.collections.PathCollection at 0x1c80e11e110>

The chosen dataset has geographic coordinates latitude -19.75 and longitude -44.45, corresponding to the locality of Pará de Minas/MG, the grid point closest to the meteorological station of Sete Lagoas/MG (lat. -19.46, lon. -44.25), which is the object of study in the following works:

  • LUCAS, Patrícia de Oliveira e. Previsão de Séries Temporais de Evapotranspiração de Referência com Redes Neurais Convolucionais
  • Patrícia de Oliveira e Lucas, Marcos Antonio Alves, Petrônio Cândido de Lima e Silva, Frederico Gadelha Guimarães. Reference evapotranspiration time series forecasting with ensemble of convolutional neural networks
  • Patrícia de Oliveira e Lucas et al. REFERENCE EVAPOTRANSPIRATION PREDICTION FOR PRECISION AGRICULTURE USING FUZZY TIME SERIES.

Importing the chosen dataset.

In [ ]:
df_target = pd.read_csv('datasets/clima/lat-19.75_lon-44.45.csv', parse_dates=True, index_col='Unnamed: 0')
df_target = df_target.round(decimals=1)
df_target.fillna(0, inplace=True)
df_target.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2000-01-01 10.5 1.0 23.2 18.9 93.7 22.2 2.2
2000-01-02 11.1 1.4 25.2 19.4 90.1 24.8 2.4
2000-01-03 11.5 1.6 26.4 19.0 86.4 30.2 2.6
2000-01-04 12.3 1.3 27.6 19.0 84.2 34.7 2.8
2000-01-05 21.9 1.0 31.0 19.4 73.8 9.9 4.7
In [ ]:
df_target.tail()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2019-12-27 25.5 2.0 31.0 18.3 65.7 0.0 5.8
2019-12-28 22.6 1.6 31.0 18.0 67.5 0.0 5.1
2019-12-29 23.1 1.4 31.4 18.1 70.9 0.0 5.1
2019-12-30 25.1 1.7 31.1 17.2 64.9 0.0 5.6
2019-12-31 26.8 1.9 32.1 16.7 62.6 0.0 6.1

Plotting the ETo series.

In [ ]:
df_target['ETo'].plot(title='Dados de Pará de Minas/MG')
plt.legend(loc='best')
plt.show()

Confirming the presence of seasonality.

In [ ]:
from pandas.plotting import autocorrelation_plot

autocorrelation_plot(df_target['ETo'])
plt.show()

Analyzing the series components.

In [ ]:
# Preliminary analysis of the series components: trend, seasonality, and residuals.

decomposicao = seasonal_decompose(df_target['ETo'], period=365)

tendencia = decomposicao.trend
sazonalidade = decomposicao.seasonal
residuo = decomposicao.resid

plt.subplot(411)
plt.plot(df_target['ETo'], label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(tendencia, label='Tendência')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(sazonalidade,label='Sazonalidade')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residuo, label='Resíduos')
plt.legend(loc='best')
plt.tight_layout()
plt.show()

Confirming the seasonal period: semiannual (183 days).

In [ ]:
decomposition=seasonal_decompose(df_target['ETo'], model='additive', period=183)
decomposition.plot()
plt.show()

Checking the stationarity of the series.

In [ ]:
teste_estacionariedade(df_target['ETo'])
valor-p = 2.797460563880368e-15. A série ETo parece ser estacionária.
Out[ ]:
True
In [ ]:
dftest = adfuller(df_target['ETo'], autolag = 'AIC')
print("1. ADF : ", dftest[0])
print("2. P-Value : ", dftest[1])
print("3. Num Of Lags : ", dftest[2])
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest[3])
print("5. Critical Values :")
for key, val in dftest[4].items():
    print("\t",key, ": ", val)
1. ADF :  -9.144138505074833
2. P-Value :  2.797460563880368e-15
3. Num Of Lags :  19
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 7285
5. Critical Values :
	 1% :  -3.4312479554815933
	 5% :  -2.861936826623019
	 10% :  -2.5669812265733456
In [ ]:
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, sharex=True)

smt.graphics.plot_acf(df_target['ETo'], ax=ax1, lags=20, alpha=0.05, title="Autocorrelação da Série de ETo")
smt.graphics.plot_pacf(df_target['ETo'], ax=ax2, lags=20,  alpha=0.05, title="Autocorrelação Parcial da Série de ETo")
plt.show()

From the correlograms, the series appears to represent a first-order autoregressive process, AR(1). Nevertheless, let us try pmdarima.arima.auto_arima to search for the best hyperparameters for the model.

In [ ]:
# Running auto_arima on the first 365 days of the series (2000).

stepwise = auto_arima(df_target['ETo'][:365],
                      start_p=0,
                      start_q=0,
                      d=0,
                      max_p=1,
                      max_q=1,
                      start_P=0,
                      start_Q=0,
                      D=0,
                      max_P=1, max_Q=1, max_D=1, max_order=1,
                      m=183,
                      seasonal=True,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept   : AIC=1088.071, Time=0.14 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept   : AIC=inf, Time=22.88 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept   : AIC=inf, Time=21.64 sec
 ARIMA(0,0,0)(0,0,0)[183]             : AIC=2022.136, Time=0.01 sec
 ARIMA(0,0,0)(1,0,0)[183] intercept   : AIC=inf, Time=18.71 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept   : AIC=inf, Time=31.81 sec
 ARIMA(0,0,0)(1,0,1)[183] intercept   : AIC=inf, Time=36.61 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept   : AIC=752.189, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept   : AIC=751.513, Time=21.12 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept   : AIC=753.552, Time=28.05 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept   : AIC=753.262, Time=25.35 sec
 ARIMA(1,0,0)(0,0,1)[183]             : AIC=792.227, Time=20.93 sec

Best model:  ARIMA(1,0,0)(0,0,1)[183] intercept
Total fit time: 231.891 seconds
In [ ]:
# Running auto_arima on the second year of the series (2001).

stepwise = auto_arima(df_target['ETo'][365:730],
                      start_p=0,
                      start_q=0,
                      d=0,
                      max_p=1,
                      max_q=1,
                      start_P=0,
                      start_Q=0,
                      D=0,
                      max_P=1, max_Q=1, max_D=1, max_order=1,
                      m=183,
                      seasonal=True,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept   : AIC=1076.032, Time=0.12 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept   : AIC=inf, Time=26.16 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept   : AIC=894.816, Time=23.54 sec
 ARIMA(0,0,0)(0,0,0)[183]             : AIC=2044.827, Time=0.01 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept   : AIC=894.833, Time=0.09 sec
 ARIMA(0,0,1)(1,0,1)[183] intercept   : AIC=898.971, Time=28.87 sec
 ARIMA(0,0,1)(1,0,0)[183] intercept   : AIC=1843.685, Time=23.25 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept   : AIC=inf, Time=22.35 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept   : AIC=793.837, Time=32.09 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept   : AIC=791.840, Time=0.05 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept   : AIC=1152.008, Time=28.00 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept   : AIC=795.838, Time=33.11 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept   : AIC=789.956, Time=0.08 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept   : AIC=791.953, Time=41.40 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept   : AIC=793.953, Time=61.31 sec
 ARIMA(1,0,0)(0,0,0)[183]             : AIC=836.858, Time=4.75 sec

Best model:  ARIMA(1,0,0)(0,0,0)[183] intercept
Total fit time: 329.653 seconds
In [ ]:
# Running auto_arima on the fifth year of the series (2004).

stepwise = auto_arima(df_target['ETo'][1460:1825],
                      start_p=0,
                      start_q=0,
                      d=0,
                      max_p=1,
                      max_q=1,
                      start_P=0,
                      start_Q=0,
                      D=0,
                      max_P=1, max_Q=1, max_D=1, max_order=1,
                      m=183,
                      seasonal=True,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept   : AIC=1129.985, Time=0.03 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept   : AIC=758.524, Time=55.25 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept   : AIC=900.992, Time=48.65 sec
 ARIMA(0,0,0)(0,0,0)[183]             : AIC=2020.728, Time=0.03 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept   : AIC=756.542, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept   : AIC=758.523, Time=26.56 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept   : AIC=760.518, Time=28.04 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept   : AIC=757.391, Time=0.05 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept   : AIC=903.256, Time=0.04 sec
 ARIMA(1,0,0)(0,0,0)[183]             : AIC=792.457, Time=0.03 sec

Best model:  ARIMA(1,0,0)(0,0,0)[183] intercept
Total fit time: 158.974 seconds
In [ ]:
# Running auto_arima on the seventh year of the series (2006).

stepwise = auto_arima(df_target['ETo'][2190:2555],
                      start_p=0,
                      start_q=0,
                      d=0,
                      max_p=1,
                      max_q=1,
                      start_P=0,
                      start_Q=0,
                      D=0,
                      max_P=1, max_Q=1, max_D=1, max_order=1,
                      m=183,
                      seasonal=True,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept   : AIC=1082.285, Time=0.03 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept   : AIC=inf, Time=26.11 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept   : AIC=887.915, Time=21.99 sec
 ARIMA(0,0,0)(0,0,0)[183]             : AIC=2010.069, Time=0.01 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept   : AIC=890.569, Time=0.04 sec
 ARIMA(0,0,1)(1,0,1)[183] intercept   : AIC=891.581, Time=29.70 sec
 ARIMA(0,0,1)(1,0,0)[183] intercept   : AIC=1951.420, Time=21.18 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept   : AIC=inf, Time=19.54 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept   : AIC=757.277, Time=29.23 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept   : AIC=755.240, Time=0.06 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept   : AIC=1065.477, Time=33.99 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept   : AIC=759.287, Time=35.75 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept   : AIC=754.425, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept   : AIC=756.387, Time=26.15 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept   : AIC=758.390, Time=43.41 sec
 ARIMA(1,0,0)(0,0,0)[183]             : AIC=796.675, Time=0.06 sec

Best model:  ARIMA(1,0,0)(0,0,0)[183] intercept
Total fit time: 288.081 seconds
In [ ]:
# Running auto_arima on the nineteenth year of the series (2018).

stepwise = auto_arima(df_target['ETo'][6570:6935],
                      start_p=0,
                      start_q=0,
                      d=0,
                      max_p=1,
                      max_q=1,
                      start_P=0,
                      start_Q=0,
                      D=0,
                      max_P=1, max_Q=1, max_D=1, max_order=1,
                      m=183,
                      seasonal=True,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept   : AIC=1140.097, Time=0.08 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept   : AIC=inf, Time=26.66 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept   : AIC=982.874, Time=23.38 sec
 ARIMA(0,0,0)(0,0,0)[183]             : AIC=2048.352, Time=0.04 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept   : AIC=982.628, Time=0.04 sec
 ARIMA(0,0,1)(1,0,0)[183] intercept   : AIC=3744.270, Time=21.50 sec
 ARIMA(0,0,1)(1,0,1)[183] intercept   : AIC=985.317, Time=31.96 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept   : AIC=867.874, Time=0.07 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept   : AIC=1405.738, Time=28.00 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept   : AIC=869.622, Time=26.59 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept   : AIC=871.700, Time=33.25 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept   : AIC=870.634, Time=0.04 sec
 ARIMA(1,0,1)(0,0,0)[183]             : AIC=897.641, Time=0.03 sec

Best model:  ARIMA(1,0,1)(0,0,0)[183] intercept
Total fit time: 192.134 seconds
In [ ]:
# Running auto_arima on the last year of the series (2019).

stepwise = auto_arima(df_target['ETo'][6940:7305],
                      start_p=0,
                      start_q=0,
                      d=0,
                      max_p=1,
                      max_q=1,
                      start_P=0,
                      start_Q=0,
                      D=0,
                      max_P=1, max_Q=1, max_D=1, max_order=1,
                      m=183,
                      seasonal=True,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept   : AIC=1203.577, Time=0.19 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept   : AIC=inf, Time=21.01 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept   : AIC=inf, Time=22.72 sec
 ARIMA(0,0,0)(0,0,0)[183]             : AIC=2099.244, Time=0.02 sec
 ARIMA(0,0,0)(1,0,0)[183] intercept   : AIC=inf, Time=17.07 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept   : AIC=inf, Time=18.30 sec
 ARIMA(0,0,0)(1,0,1)[183] intercept   : AIC=inf, Time=25.63 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept   : AIC=847.188, Time=0.07 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept   : AIC=849.187, Time=24.97 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept   : AIC=851.187, Time=32.13 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept   : AIC=839.578, Time=0.05 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept   : AIC=1356.271, Time=38.88 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept   : AIC=841.572, Time=27.10 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept   : AIC=843.610, Time=47.15 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept   : AIC=1008.495, Time=0.81 sec
 ARIMA(1,0,1)(0,0,0)[183]             : AIC=856.753, Time=0.35 sec

Best model:  ARIMA(1,0,1)(0,0,0)[183] intercept
Total fit time: 277.221 seconds
In [ ]:
# Normalize Data

# Save Min-Max for Denorm
min_raw = df_target.min()

max_raw = df_target.max()

# Perform Normalization
norm_df = normalize(df_target)
df = df_target
In [ ]:
# ARIMA model, with order (1,0,0) as most often selected by the auto_arima runs above.

# model = ARIMA(df_target['ETo'], order=(1,0,0))
model = ARIMA(norm_df['ETo'], order=(1,0,0))
model_fit = model.fit()
print(model_fit.summary())

# Splitting the dataset into training and test sets, 80%/20%.

# X = df_target['ETo'].values
X = norm_df['ETo'].values
size = int(len(X) * 0.80)
train, test = X[0:size], X[size:len(X)]
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                    ETo   No. Observations:                 7305
Model:                 ARIMA(1, 0, 0)   Log Likelihood                5157.796
Date:                Mon, 11 Dec 2023   AIC                         -10309.592
Time:                        22:11:59   BIC                         -10288.903
Sample:                    01-01-2000   HQIC                        -10302.478
                         - 12-31-2019                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.4826      0.007     72.618      0.000       0.470       0.496
ar.L1          0.7889      0.008     93.116      0.000       0.772       0.806
sigma2         0.0143      0.000     70.864      0.000       0.014       0.015
===================================================================================
Ljung-Box (L1) (Q):                   5.39   Jarque-Bera (JB):               970.04
Prob(Q):                              0.02   Prob(JB):                         0.00
Heteroskedasticity (H):               1.27   Skew:                            -0.14
Prob(H) (two-sided):                  0.00   Kurtosis:                         4.76
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
In [ ]:
# Applying the forecasts with a rolling (walk-forward) scheme:
# refit on the growing history and predict one step ahead at a time.

history = [x for x in train]
predictions = []

n_prints = 3

for t in range(len(test)):
	model = ARIMA(history, order=(1,0,0))
	model_fit = model.fit()
	output = model_fit.forecast()
	yhat = output[0]
	predictions.append(yhat)
	obs = test[t]
	history.append(obs)
	if t < 3 or t >= (len(test) - 3):
		print('previsão=%f, observado=%f' % (yhat, obs))
previsão=0.680182, observado=0.716667
previsão=0.667001, observado=0.433333
...
previsão=0.442556, observado=0.316667
previsão=0.640897, observado=0.766667
previsão=0.706656, observado=0.850000
In [ ]:
# Measuring model performance with the MAE, RMSE, and MAPE metrics

mae_ar = mean_absolute_error(test, predictions)
print('Test MAE: %.3f' % mae_ar)

rmse_ar = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse_ar)

mape_ar = mean_absolute_percentage_error(test, predictions)
print('Test MAPE: %.3f' % mape_ar)

plt.title("ETo de Pará de Minas/MG")
plt.plot(test, label='Test')
plt.plot(predictions, color='red', label='Prediction')
plt.legend(loc='best')
plt.show()
Test MAE: 0.091
Test RMSE: 0.126
Test MAPE: 0.259
In [ ]:
# Plotting the ARIMA model's performance over the train/test timeline.

fig, ax = plt.subplots()

ax.plot(df_target['ETo'].index[0:size], train, 'g-.', label='Train')
ax.plot(df_target['ETo'].index[size:len(X)], test, 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(X)], predictions, 'r--', label='Predicted')

ax.legend(loc='best')

ax.set_xlabel('Year')
ax.set_ylabel('Value')

ax.axvspan(df_target['ETo'].index[size], df_target['ETo'].index[len(X) - 1], color='#808080', alpha=0.2)  # shaded test window

plt.title("ETo de Pará de Minas/MG")

fig.autofmt_xdate()
plt.tight_layout()

Here begin the multivariate forecasts.¶

In [ ]:
df_target.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2000-01-01 10.5 1.0 23.2 18.9 93.7 22.2 2.2
2000-01-02 11.1 1.4 25.2 19.4 90.1 24.8 2.4
2000-01-03 11.5 1.6 26.4 19.0 86.4 30.2 2.6
2000-01-04 12.3 1.3 27.6 19.0 84.2 34.7 2.8
2000-01-05 21.9 1.0 31.0 19.4 73.8 9.9 4.7

Exogenous variables (predictors, input variables):

  • Rs - Solar radiation
  • u2 - Wind speed
  • Tmax - Maximum temperature
  • Tmin - Minimum temperature
  • RH - Relative humidity
  • pr - Precipitation

Endogenous variable (target variable, variable of interest, output variable):

  • ETo - Reference Evapotranspiration
In [ ]:
df_target.plot(title='Dados de Pará de Minas/MG')
plt.legend(loc='best')
plt.show()
In [ ]:
values = df_target.values
# specify columns to plot
groups = [0, 1, 2, 3, 4, 5, 6]
i = 1
# plot each column
plt.figure()
for group in groups:
	plt.subplot(len(groups), 1, i)
	plt.plot(values[:, group])
	plt.title(df_target.columns[group], y=0.5, loc='right')
	i += 1
plt.show()

Separating the target variable data from the exogenous variables.

In [ ]:
target = df_target[['ETo']]
target.name = 'ETo'
target.head()
Out[ ]:
ETo
2000-01-01 2.2
2000-01-02 2.4
2000-01-03 2.6
2000-01-04 2.8
2000-01-05 4.7
In [ ]:
exog = df_target[['Rs', 'u2', 'Tmax', 'Tmin', 'RH', 'pr']]
exog.name = 'Exogenous Variables'
exog.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr
2000-01-01 10.5 1.0 23.2 18.9 93.7 22.2
2000-01-02 11.1 1.4 25.2 19.4 90.1 24.8
2000-01-03 11.5 1.6 26.4 19.0 86.4 30.2
2000-01-04 12.3 1.3 27.6 19.0 84.2 34.7
2000-01-05 21.9 1.0 31.0 19.4 73.8 9.9

Testing the stationarity of the target series.

In [ ]:
teste_estacionariedade(target)
valor-p = 2.7974605638801155e-15. A série ETo parece ser estacionária.
Out[ ]:
True
In [ ]:
from statsmodels.tsa.api import VAR
In [ ]:
def df_derived_by_shift(df,lag=0,NON_DER=[]):
    df = df.copy()
    if not lag:
        return df
    cols ={}
    for i in range(1,lag+1):
        for x in list(df.columns):
            if x not in NON_DER:
                if not x in cols:
                    cols[x] = ['{}_{}'.format(x, i)]
                else:
                    cols[x].append('{}_{}'.format(x, i))
    for k,v in cols.items():
        columns = v
        dfn = pd.DataFrame(data=None, columns=columns, index=df.index)    
        i = 1
        for c in columns:
            dfn[c] = df[k].shift(periods=i)
            i+=1
        df = pd.concat([df, dfn], axis=1)
    return df
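A small demonstration of what the helper above produces (the function is re-stated compactly so the snippet runs standalone; the demo frame is hypothetical):

```python
import pandas as pd

def df_derived_by_shift(df, lag=0, NON_DER=[]):
    # Same logic as the helper above: for every column not in NON_DER,
    # append lag columns col_1 .. col_lag shifted by 1..lag periods.
    df = df.copy()
    if not lag:
        return df
    cols = {}
    for i in range(1, lag + 1):
        for x in df.columns:
            if x not in NON_DER:
                cols.setdefault(x, []).append('{}_{}'.format(x, i))
    for k, v in cols.items():
        dfn = pd.DataFrame(index=df.index, columns=v)
        for i, c in enumerate(v, start=1):
            dfn[c] = df[k].shift(periods=i)
        df = pd.concat([df, dfn], axis=1)
    return df

demo = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
out = df_derived_by_shift(demo, lag=2, NON_DER=['B'])
print(list(out.columns))  # ['A', 'B', 'A_1', 'A_2']
```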
In [ ]:
norm_df.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2000-01-01 0.243542 0.25000 0.268868 0.816038 0.986175 0.205937 0.200000
2000-01-02 0.265683 0.37500 0.363208 0.839623 0.930876 0.230056 0.233333
2000-01-03 0.280443 0.43750 0.419811 0.820755 0.874040 0.280148 0.266667
2000-01-04 0.309963 0.34375 0.476415 0.820755 0.840246 0.321892 0.300000
2000-01-05 0.664207 0.25000 0.636792 0.839623 0.680492 0.091837 0.616667
In [ ]:
# Splitting the dataset into training and test sets, 80%/20%.

# X = df_target['ETo'].values
size = int(len(norm_df) * 0.80)
train, test = norm_df[0:size], norm_df[size:len(norm_df)]

# VAR model.

order = 3
model = VAR(train.values)
model_fit = model.fit(maxlags=order)
lag_order = model_fit.k_ar

print(f'lag_order = {lag_order}\n')
print(model_fit.summary())
lag_order = 3

  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Mon, 11, Dec, 2023
Time:                     22:30:55
--------------------------------------------------------------------
No. of Equations:         7.00000    BIC:                   -38.8687
Nobs:                     5841.00    HQIC:                  -38.9834
Log likelihood:           56167.7    FPE:                1.10442e-17
AIC:                     -39.0446    Det(Omega_mle):     1.07574e-17
--------------------------------------------------------------------
Results for equation y1
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.337793         0.029471           11.462           0.000
L1.y1         0.153082         0.067637            2.263           0.024
L1.y2        -0.140028         0.030479           -4.594           0.000
L1.y3        -0.288964         0.037112           -7.786           0.000
L1.y4        -0.124168         0.035764           -3.472           0.001
L1.y5        -0.076731         0.033016           -2.324           0.020
L1.y6         0.028181         0.026531            1.062           0.288
L1.y7         0.597346         0.093981            6.356           0.000
L2.y1        -0.010511         0.073283           -0.143           0.886
L2.y2         0.033535         0.034195            0.981           0.327
L2.y3         0.032007         0.041163            0.778           0.437
L2.y4         0.015724         0.039733            0.396           0.692
L2.y5         0.045582         0.039146            1.164           0.244
L2.y6         0.081578         0.026663            3.060           0.002
L2.y7         0.047763         0.104510            0.457           0.648
L3.y1        -0.025231         0.066502           -0.379           0.704
L3.y2        -0.039924         0.031466           -1.269           0.205
L3.y3        -0.007293         0.033934           -0.215           0.830
L3.y4         0.023225         0.033517            0.693           0.488
L3.y5         0.076131         0.034186            2.227           0.026
L3.y6        -0.033234         0.026103           -1.273           0.203
L3.y7         0.107963         0.094382            1.144           0.253
========================================================================

Results for equation y2
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.280645         0.021634           12.972           0.000
L1.y1        -0.140443         0.049651           -2.829           0.005
L1.y2         0.469955         0.022374           21.005           0.000
L1.y3        -0.064513         0.027243           -2.368           0.018
L1.y4         0.034738         0.026253            1.323           0.186
L1.y5        -0.153016         0.024236           -6.313           0.000
L1.y6         0.003421         0.019476            0.176           0.861
L1.y7         0.155128         0.068990            2.249           0.025
L2.y1        -0.136538         0.053796           -2.538           0.011
L2.y2        -0.056137         0.025102           -2.236           0.025
L2.y3         0.052977         0.030217            1.753           0.080
L2.y4        -0.081352         0.029167           -2.789           0.005
L2.y5         0.059432         0.028736            2.068           0.039
L2.y6         0.004443         0.019573            0.227           0.820
L2.y7         0.123180         0.076719            1.606           0.108
L3.y1        -0.064224         0.048818           -1.316           0.188
L3.y2         0.080862         0.023099            3.501           0.000
L3.y3         0.004188         0.024910            0.168           0.866
L3.y4        -0.002234         0.024604           -0.091           0.928
L3.y5        -0.006005         0.025095           -0.239           0.811
L3.y6         0.019104         0.019162            0.997           0.319
L3.y7         0.052985         0.069284            0.765           0.444
========================================================================

Results for equation y3
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.215614         0.015183           14.201           0.000
L1.y1         0.035070         0.034846            1.006           0.314
L1.y2        -0.262577         0.015702          -16.722           0.000
L1.y3         0.440181         0.019120           23.023           0.000
L1.y4         0.105685         0.018425            5.736           0.000
L1.y5        -0.108400         0.017010           -6.373           0.000
L1.y6        -0.000875         0.013668           -0.064           0.949
L1.y7         0.304156         0.048418            6.282           0.000
L2.y1         0.067322         0.037755            1.783           0.075
L2.y2         0.131608         0.017617            7.471           0.000
L2.y3         0.044007         0.021207            2.075           0.038
L2.y4        -0.014950         0.020470           -0.730           0.465
L2.y5        -0.001321         0.020168           -0.066           0.948
L2.y6         0.033876         0.013736            2.466           0.014
L2.y7        -0.199418         0.053843           -3.704           0.000
L3.y1        -0.094470         0.034261           -2.757           0.006
L3.y2        -0.001927         0.016211           -0.119           0.905
L3.y3         0.043795         0.017482            2.505           0.012
L3.y4        -0.021245         0.017268           -1.230           0.219
L3.y5         0.019848         0.017612            1.127           0.260
L3.y6         0.003339         0.013448            0.248           0.804
L3.y7         0.095805         0.048625            1.970           0.049
========================================================================

Results for equation y4
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.090009         0.013845            6.501           0.000
L1.y1        -0.298909         0.031774           -9.407           0.000
L1.y2        -0.203510         0.014318          -14.214           0.000
L1.y3         0.195550         0.017434           11.217           0.000
L1.y4         0.602033         0.016801           35.834           0.000
L1.y5         0.128608         0.015510            8.292           0.000
L1.y6         0.037764         0.012463            3.030           0.002
L1.y7         0.400819         0.044149            9.079           0.000
L2.y1         0.192060         0.034426            5.579           0.000
L2.y2         0.092546         0.016064            5.761           0.000
L2.y3        -0.074405         0.019337           -3.848           0.000
L2.y4         0.107889         0.018665            5.780           0.000
L2.y5        -0.076377         0.018389           -4.153           0.000
L2.y6         0.023572         0.012525            1.882           0.060
L2.y7        -0.279247         0.049095           -5.688           0.000
L3.y1        -0.081435         0.031240           -2.607           0.009
L3.y2        -0.048108         0.014782           -3.255           0.001
L3.y3        -0.099461         0.015941           -6.239           0.000
L3.y4         0.083390         0.015745            5.296           0.000
L3.y5         0.003746         0.016059            0.233           0.816
L3.y6         0.021637         0.012262            1.764           0.078
L3.y7         0.199781         0.044337            4.506           0.000
========================================================================

Results for equation y5
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const        -0.007167         0.020294           -0.353           0.724
L1.y1         0.042802         0.046576            0.919           0.358
L1.y2         0.038031         0.020988            1.812           0.070
L1.y3         0.126728         0.025556            4.959           0.000
L1.y4         0.110598         0.024628            4.491           0.000
L1.y5         0.724546         0.022736           31.868           0.000
L1.y6        -0.057504         0.018269           -3.148           0.002
L1.y7        -0.166633         0.064717           -2.575           0.010
L2.y1         0.053567         0.050464            1.061           0.288
L2.y2        -0.033198         0.023547           -1.410           0.159
L2.y3        -0.009882         0.028346           -0.349           0.727
L2.y4        -0.038585         0.027361           -1.410           0.158
L2.y5         0.026103         0.026957            0.968           0.333
L2.y6        -0.051652         0.018361           -2.813           0.005
L2.y7        -0.016959         0.071968           -0.236           0.814
L3.y1         0.035306         0.045795            0.771           0.441
L3.y2        -0.031083         0.021668           -1.434           0.151
L3.y3        -0.028325         0.023368           -1.212           0.225
L3.y4        -0.013239         0.023081           -0.574           0.566
L3.y5         0.109588         0.023541            4.655           0.000
L3.y6         0.021769         0.017975            1.211           0.226
L3.y7         0.058577         0.064993            0.901           0.367
========================================================================

Results for equation y6
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const        -0.019080         0.014921           -1.279           0.201
L1.y1         0.099052         0.034245            2.892           0.004
L1.y2         0.064682         0.015432            4.191           0.000
L1.y3         0.135060         0.018790            7.188           0.000
L1.y4         0.022096         0.018107            1.220           0.222
L1.y5         0.219222         0.016716           13.114           0.000
L1.y6         0.148041         0.013433           11.021           0.000
L1.y7        -0.257301         0.047583           -5.407           0.000
L2.y1        -0.170807         0.037104           -4.603           0.000
L2.y2        -0.091633         0.017313           -5.293           0.000
L2.y3        -0.127643         0.020841           -6.124           0.000
L2.y4        -0.038244         0.020117           -1.901           0.057
L2.y5        -0.070009         0.019820           -3.532           0.000
L2.y6         0.093356         0.013500            6.915           0.000
L2.y7         0.284543         0.052914            5.377           0.000
L3.y1        -0.181470         0.033671           -5.390           0.000
L3.y2        -0.054435         0.015932           -3.417           0.001
L3.y3        -0.056016         0.017181           -3.260           0.001
L3.y4        -0.027106         0.016970           -1.597           0.110
L3.y5         0.024112         0.017309            1.393           0.164
L3.y6         0.059068         0.013216            4.469           0.000
L3.y7         0.304197         0.047786            6.366           0.000
========================================================================

Results for equation y7
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.304923         0.024622           12.384           0.000
L1.y1        -0.279235         0.056509           -4.941           0.000
L1.y2        -0.181726         0.025464           -7.137           0.000
L1.y3        -0.211371         0.031006           -6.817           0.000
L1.y4        -0.025242         0.029880           -0.845           0.398
L1.y5        -0.143211         0.027584           -5.192           0.000
L1.y6         0.061639         0.022166            2.781           0.005
L1.y7         1.066377         0.078519           13.581           0.000
L2.y1        -0.092224         0.061226           -1.506           0.132
L2.y2         0.028722         0.028569            1.005           0.315
L2.y3         0.011346         0.034391            0.330           0.741
L2.y4        -0.011722         0.033196           -0.353           0.724
L2.y5         0.043393         0.032706            1.327           0.185
L2.y6         0.101063         0.022276            4.537           0.000
L2.y7         0.106787         0.087316            1.223           0.221
L3.y1        -0.166293         0.055561           -2.993           0.003
L3.y2        -0.039785         0.026289           -1.513           0.130
L3.y3        -0.039566         0.028351           -1.396           0.163
L3.y4         0.002645         0.028003            0.094           0.925
L3.y5         0.068127         0.028562            2.385           0.017
L3.y6        -0.007268         0.021809           -0.333           0.739
L3.y7         0.281473         0.078854            3.570           0.000
========================================================================

Correlation matrix of residuals
            y1        y2        y3        y4        y5        y6        y7
y1    1.000000  0.079552  0.498897 -0.365631 -0.753555 -0.208583  0.952331
y2    0.079552  1.000000 -0.238870  0.043716 -0.025636 -0.055213  0.264704
y3    0.498897 -0.238870  1.000000 -0.015786 -0.439831 -0.141851  0.529649
y4   -0.365631  0.043716 -0.015786  1.000000  0.373942  0.030056 -0.236243
y5   -0.753555 -0.025636 -0.439831  0.373942  1.000000  0.217675 -0.763868
y6   -0.208583 -0.055213 -0.141851  0.030056  0.217675  1.000000 -0.215717
y7    0.952331  0.264704  0.529649 -0.236243 -0.763868 -0.215717  1.000000



In [ ]:
# Forecast 3 steps ahead for all variables; the output is still normalized.
fc = model_fit.forecast(y=test.values, steps=3)
# Map the forecasts back to the original scale.
dfc = denormalize(fc, min_raw, max_raw)
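`denormalize` is a helper defined earlier in the notebook. Assuming it inverts the column-wise min-max scaling applied by `normalize`, it amounts to the sketch below (the name `denormalize_sketch` and the toy data are illustrative):

```python
import numpy as np

def denormalize_sketch(values, min_raw, max_raw):
    """Invert min-max scaling: x_norm * (max - min) + min.
    `values` is an (n_steps, n_series) array of normalized forecasts;
    `min_raw`/`max_raw` hold the per-column minima and maxima saved
    before normalization."""
    values = np.asarray(values)
    mn, mx = np.asarray(min_raw), np.asarray(max_raw)
    return values * (mx - mn) + mn

# Round-trip check on toy data: normalizing and then denormalizing
# recovers the original values.
raw = np.array([[2.0, 10.0], [4.0, 30.0], [3.0, 20.0]])
mn, mx = raw.min(axis=0), raw.max(axis=0)
norm = (raw - mn) / (mx - mn)
print(denormalize_sketch(norm, mn, mx))  # recovers `raw`
```

The round trip only works if the same `min_raw`/`max_raw` used for normalization are kept, which is why the notebook saves them before scaling.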
In [ ]:
# One-step-ahead rolling forecast over the test set. The model is fitted
# once on the training data; each forecast is conditioned on the test
# observations available up to that point.

order = 1

model = VAR(train.values)
model_fit = model.fit(maxlags=order)

predictions = []

for t in range(len(test)):
    fc = model_fit.forecast(y=test[:t+1].values, steps=1)
    yhat = fc[0][6]  # column 6 is the target series (ETo)
    predictions.append(yhat)
    obs = test['ETo'].iloc[t]
    if t < 3 or t >= (len(test) - 3):  # print only the first and last 3 steps
        print('predicted=%f, observed=%f' % (yhat, obs))
predicted=0.672375, observed=0.716667
predicted=0.437767, observed=0.433333
predicted=0.349767, observed=0.316667
...
predicted=0.639124, observed=0.683333
predicted=0.708566, observed=0.766667
predicted=0.768693, observed=0.850000
In [ ]:
# Measuring model performance with the metrics MAE, RMSE and MAPE.

mae_var_int = mean_absolute_error(test['ETo'], predictions)
print('Test MAE: %.3f' % mae_var_int)

rmse_var_int = sqrt(mean_squared_error(test['ETo'], predictions))
print('Test RMSE: %.3f' % rmse_var_int)

mape_var_int = mean_absolute_percentage_error(test['ETo'], predictions)
print('Test MAPE: %.3f' % mape_var_int)
Test MAE: 0.042
Test RMSE: 0.052
Test MAPE: 0.124
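All three metrics reduce to short formulas. The numpy-only sketch below is equivalent in spirit to the sklearn calls above (function names are illustrative); note that, like sklearn's `mean_absolute_percentage_error`, MAPE is returned as a fraction, so 0.124 corresponds to 12.4%:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (not percent)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

# Illustrative values taken from the first three rolling-forecast steps.
y_obs = np.array([0.716667, 0.433333, 0.316667])
y_hat = np.array([0.672375, 0.437767, 0.349767])
print(mae(y_obs, y_hat), rmse(y_obs, y_hat), mape(y_obs, y_hat))
```

RMSE is never smaller than MAE (squaring penalizes large errors more), and MAPE is scale-free, which makes it convenient even on normalized series.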
In [ ]:
fig, ax = plt.subplots()

ax.plot(df_target['ETo'].index[size:len(df_target)], test['ETo'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')

ax.legend(loc='best')

ax.set_xlabel('Year')
ax.set_ylabel('Value')


plt.title("ETo at Pará de Minas/MG")

fig.autofmt_xdate()
plt.tight_layout()
In [ ]:
# Plotting the VAR model's performance: train, test and predicted series.

fig, ax = plt.subplots()

ax.plot(df_target['ETo'].index[0:size], train['ETo'], 'g-.', label='Train')
ax.plot(df_target['ETo'].index[size:len(df_target)], test['ETo'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')

ax.legend(loc='best')

ax.set_xlabel('Year')
ax.set_ylabel('Value')


plt.title("ETo at Pará de Minas/MG")

fig.autofmt_xdate()
plt.tight_layout()

The idea here is to use ETo series from other stations for the multivariate forecast.¶

In [ ]:
origin = "datasets/clima/mg/"
geolocator = Nominatim(user_agent='geoapiFSAF')
In [ ]:
file_names = []

for path, subdirectory, files in os.walk(origin):
    for name in files:
        file_names.append(name.split(sep='.csv')[0])

len(file_names)
Out[ ]:
12
In [ ]:
latitudes = []
longitudes = []

for fn in file_names:
    lat, lon = return_coordinates(fn,latitudes, longitudes)
    latitudes.append(lat)
    longitudes.append(lon)
-15.35, -42.25
-15.35, -44.45
-15.35, -46.65
-17.55, -42.25
-17.55, -44.45
-17.55, -46.65
-19.75, -42.25
-19.75, -44.45
-19.75, -46.65
-19.75, -48.85
-21.95, -44.45
-21.95, -46.65
In [ ]:
all_muni = geobr.read_municipality(code_muni="MG", year=2022)
In [ ]:
# plot
fig, ax = plt.subplots(figsize=(10, 10), dpi=200)

all_muni.plot(facecolor="#d2c1af", edgecolor="#FEBF57", ax=ax)
plt.scatter(longitudes, latitudes, marker='.', color='green', label='Available stations')
plt.scatter(-44.17, -19.46, marker='*', color='red', label='Sete Lagoas')
plt.scatter(-44.45, -17.55, marker='+', color='blue', label='Selected stations')
plt.scatter(-42.25, -19.75, marker='+', color='blue')
plt.scatter(-44.45, -21.95, marker='+', color='blue')
plt.scatter(-46.65, -19.75, marker='+', color='blue')
ax.set_title("ETo points of interest - Minas Gerais", fontsize=20)
ax.axis("off")
plt.legend(loc='best')

plt.show()
In [ ]:
df1 = pd.read_csv('datasets/clima/lat-17.55_lon-44.45.csv', parse_dates=True, index_col='Unnamed: 0')
df1.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2000-01-01 10.4 1.8 22.2 18.6 93.3 30.2 2.1
2000-01-02 10.7 1.4 22.5 18.3 91.8 49.0 2.2
2000-01-03 14.4 2.3 26.0 17.9 86.4 26.5 3.1
2000-01-04 13.4 1.2 25.7 18.8 85.7 16.7 2.9
2000-01-05 24.1 1.0 29.2 18.2 76.7 2.1 4.9
In [ ]:
df1.plot(title='Francisco Dumont/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df1['ETo'].plot(title='Francisco Dumont/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df2 = pd.read_csv('datasets/clima/lat-19.75_lon-42.25.csv', parse_dates=True, index_col='Unnamed: 0')
df2.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2000-01-01 10.5 1.4 25.4 19.2 91.0 12.1 2.3
2000-01-02 12.2 1.9 26.9 18.9 89.2 16.0 2.7
2000-01-03 17.0 1.2 28.5 18.6 81.8 16.9 3.6
2000-01-04 13.5 1.4 28.3 19.4 84.2 0.0 3.1
2000-01-05 24.2 1.0 30.1 19.6 80.2 0.0 4.9
In [ ]:
df2.plot(title='Bom Jesus do Galho/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df2['ETo'].plot(title='Bom Jesus do Galho/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df3 = pd.read_csv('datasets/clima/lat-21.95_lon-44.45.csv', parse_dates=True, index_col='Unnamed: 0')
df3.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2000-01-01 10.6 1.3 22.3 16.8 92.2 32.0 2.2
2000-01-02 10.6 1.9 20.7 15.9 93.9 65.6 2.0
2000-01-03 10.8 1.3 20.0 15.1 94.6 72.8 2.1
2000-01-04 11.5 0.9 23.4 15.9 90.2 49.4 2.4
2000-01-05 19.4 1.0 27.8 15.9 80.7 22.7 3.9
In [ ]:
df3.plot(title='Carvalhos/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df3['ETo'].plot(title='Carvalhos/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df4 = pd.read_csv('datasets/clima/lat-19.75_lon-46.65.csv', parse_dates=True, index_col='Unnamed: 0')
df4.head()
Out[ ]:
Rs u2 Tmax Tmin RH pr ETo
2000-01-01 11.0 1.3 24.6 18.1 93.2 30.3 2.3
2000-01-02 10.6 2.0 24.9 18.5 90.8 29.6 2.3
2000-01-03 10.5 1.8 21.6 17.6 94.9 35.6 2.1
2000-01-04 10.7 1.6 24.2 17.6 93.9 38.1 2.2
2000-01-05 11.5 1.2 25.4 18.1 90.5 10.4 2.5
In [ ]:
df4.plot(title='Argenita/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df4['ETo'].plot(title='Argenita/MG data')
plt.legend(loc='best')
plt.show()
In [ ]:
df_etos = pd.DataFrame()

df_etos['x1'] = df1['ETo']
df_etos['x2'] = df2['ETo']
df_etos['x3'] = df3['ETo']
df_etos['x4'] = df4['ETo']
df_etos['target'] = df_target['ETo']
df_etos.head()
Out[ ]:
x1 x2 x3 x4 target
2000-01-01 2.1 2.3 2.2 2.3 2.2
2000-01-02 2.2 2.7 2.0 2.3 2.4
2000-01-03 3.1 3.6 2.1 2.1 2.6
2000-01-04 2.9 3.1 2.4 2.2 2.8
2000-01-05 4.9 4.9 3.9 2.5 4.7
In [ ]:
df_etos.describe()
Out[ ]:
x1 x2 x3 x4 target
count 7305.000000 7305.000000 7305.000000 7305.000000 7305.000000
mean 4.062368 3.533593 3.217180 3.765270 3.895592
std 1.145331 1.228129 1.136878 1.085725 1.165992
min 1.400000 1.100000 0.800000 1.000000 1.000000
25% 3.100000 2.500000 2.300000 2.900000 2.900000
50% 3.900000 3.300000 3.100000 3.600000 3.700000
75% 4.900000 4.500000 4.000000 4.600000 4.800000
max 7.800000 7.400000 6.800000 6.800000 7.000000
In [ ]:
sns.pairplot(df_etos)
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x2a1ef940>
In [ ]:
fields = ['x1', 'x2', 'x3', 'x4', 'target']
x = df_etos[fields]
x.head(10)
Out[ ]:
x1 x2 x3 x4 target
2000-01-01 2.1 2.3 2.2 2.3 2.2
2000-01-02 2.2 2.7 2.0 2.3 2.4
2000-01-03 3.1 3.6 2.1 2.1 2.6
2000-01-04 2.9 3.1 2.4 2.2 2.8
2000-01-05 4.9 4.9 3.9 2.5 4.7
2000-01-06 4.9 5.1 4.9 3.5 5.5
2000-01-07 5.2 5.3 5.2 3.8 5.7
2000-01-08 4.9 4.3 5.4 5.3 5.5
2000-01-09 5.4 2.8 5.9 5.2 5.5
2000-01-10 5.0 4.6 5.6 5.3 5.8
In [ ]:
NON_DER = []  # columns to exclude from lag generation (none here)
df_new = df_derived_by_shift(x, 4, NON_DER)  # add 1- to 4-day lagged copies of every column
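`df_derived_by_shift` is also defined earlier in the notebook. Assuming it appends 1- to `lag`-day shifted copies of every column not listed in `NON_DER`, a minimal pandas sketch would be (the name `df_derived_by_shift_sketch` and the toy frame are illustrative):

```python
import pandas as pd

def df_derived_by_shift_sketch(df, lag, non_der=()):
    """Append columns col_1 ... col_lag holding df[col] shifted by 1..lag rows."""
    out = df.copy()
    for col in df.columns:
        if col in non_der:
            continue
        for i in range(1, lag + 1):
            out[f'{col}_{i}'] = df[col].shift(i)
    return out

toy = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})
wide = df_derived_by_shift_sketch(toy, 2)
print(wide.shape)  # (3, 6): 2 original columns + 2 lags of each
```

With 5 columns and `lag=4` this yields 5 + 5×4 = 25 columns, matching the shape of `df_new`; the first `lag` rows contain NaNs, which is why they are dropped afterwards.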
In [ ]:
df_new.head()
Out[ ]:
x1 x2 x3 x4 target x1_1 x1_2 x1_3 x1_4 x2_1 ... x3_3 x3_4 x4_1 x4_2 x4_3 x4_4 target_1 target_2 target_3 target_4
2000-01-01 2.1 2.3 2.2 2.3 2.2 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-02 2.2 2.7 2.0 2.3 2.4 2.1 NaN NaN NaN 2.3 ... NaN NaN 2.3 NaN NaN NaN 2.2 NaN NaN NaN
2000-01-03 3.1 3.6 2.1 2.1 2.6 2.2 2.1 NaN NaN 2.7 ... NaN NaN 2.3 2.3 NaN NaN 2.4 2.2 NaN NaN
2000-01-04 2.9 3.1 2.4 2.2 2.8 3.1 2.2 2.1 NaN 3.6 ... 2.2 NaN 2.1 2.3 2.3 NaN 2.6 2.4 2.2 NaN
2000-01-05 4.9 4.9 3.9 2.5 4.7 2.9 3.1 2.2 2.1 3.1 ... 2.0 2.2 2.2 2.1 2.3 2.3 2.8 2.6 2.4 2.2

5 rows × 25 columns

In [ ]:
df_new = df_new.dropna()
In [ ]:
df_new.head()
Out[ ]:
x1 x2 x3 x4 target x1_1 x1_2 x1_3 x1_4 x2_1 ... x3_3 x3_4 x4_1 x4_2 x4_3 x4_4 target_1 target_2 target_3 target_4
2000-01-05 4.9 4.9 3.9 2.5 4.7 2.9 3.1 2.2 2.1 3.1 ... 2.0 2.2 2.2 2.1 2.3 2.3 2.8 2.6 2.4 2.2
2000-01-06 4.9 5.1 4.9 3.5 5.5 4.9 2.9 3.1 2.2 4.9 ... 2.1 2.0 2.5 2.2 2.1 2.3 4.7 2.8 2.6 2.4
2000-01-07 5.2 5.3 5.2 3.8 5.7 4.9 4.9 2.9 3.1 5.1 ... 2.4 2.1 3.5 2.5 2.2 2.1 5.5 4.7 2.8 2.6
2000-01-08 4.9 4.3 5.4 5.3 5.5 5.2 4.9 4.9 2.9 5.3 ... 3.9 2.4 3.8 3.5 2.5 2.2 5.7 5.5 4.7 2.8
2000-01-09 5.4 2.8 5.9 5.2 5.5 4.9 5.2 4.9 4.9 4.3 ... 4.9 3.9 5.3 3.8 3.5 2.5 5.5 5.7 5.5 4.7

5 rows × 25 columns

In [ ]:
df_new.corr()
Out[ ]:
x1 x2 x3 x4 target x1_1 x1_2 x1_3 x1_4 x2_1 ... x3_3 x3_4 x4_1 x4_2 x4_3 x4_4 target_1 target_2 target_3 target_4
x1 1.000000 0.779660 0.572394 0.673563 0.818432 0.808354 0.688298 0.604119 0.551753 0.701051 ... 0.480812 0.440096 0.658057 0.582857 0.513447 0.461159 0.748016 0.645322 0.562314 0.507240
x2 0.779660 1.000000 0.700405 0.640360 0.811726 0.677007 0.582250 0.522083 0.488851 0.784082 ... 0.569185 0.522505 0.642308 0.565833 0.492845 0.448007 0.733317 0.630401 0.551571 0.502079
x3 0.572394 0.700405 1.000000 0.793822 0.774858 0.497210 0.440292 0.414338 0.409969 0.582797 ... 0.548417 0.515298 0.656966 0.525570 0.453505 0.416872 0.633741 0.530960 0.476539 0.452356
x4 0.673563 0.640360 0.793822 1.000000 0.844529 0.570686 0.501539 0.457320 0.437959 0.543532 ... 0.452001 0.411746 0.739909 0.569562 0.476092 0.429567 0.674316 0.544038 0.476691 0.442275
target 0.818432 0.811726 0.774858 0.844529 1.000000 0.689353 0.587744 0.526335 0.495982 0.685596 ... 0.516355 0.469410 0.739108 0.595171 0.503852 0.449736 0.788732 0.635080 0.544451 0.496099
x1_1 0.808354 0.677007 0.497210 0.570686 0.689353 1.000000 0.808304 0.688223 0.604109 0.779600 ... 0.536941 0.480677 0.673517 0.657984 0.582808 0.513344 0.818379 0.747972 0.645316 0.562307
x1_2 0.688298 0.582250 0.440292 0.501539 0.587744 0.808304 1.000000 0.808232 0.688187 0.676932 ... 0.592017 0.536820 0.570611 0.673431 0.657929 0.582708 0.689261 0.818340 0.747959 0.645293
x1_3 0.604119 0.522083 0.414338 0.457320 0.526335 0.688223 0.808232 1.000000 0.808281 0.582110 ... 0.572020 0.591912 0.501531 0.570581 0.673430 0.657888 0.587621 0.689219 0.818371 0.748004
x1_4 0.551753 0.488851 0.409969 0.437959 0.495982 0.604109 0.688187 0.808281 1.000000 0.521972 ... 0.496922 0.571991 0.457391 0.501598 0.570630 0.673463 0.526309 0.587631 0.689256 0.818402
x2_1 0.701051 0.784082 0.582797 0.543532 0.685596 0.779600 0.676932 0.582110 0.521972 1.000000 ... 0.642084 0.569084 0.640252 0.642185 0.565730 0.492713 0.811684 0.733242 0.630326 0.551481
x2_2 0.605433 0.650588 0.510284 0.474396 0.583125 0.700914 0.779500 0.676709 0.581903 0.784028 ... 0.721210 0.641944 0.543332 0.640039 0.642013 0.565527 0.685461 0.811561 0.733105 0.630159
x2_3 0.539449 0.578971 0.487843 0.441044 0.526357 0.605346 0.700850 0.779421 0.676649 0.650519 ... 0.700134 0.721150 0.474310 0.543231 0.639976 0.641930 0.583023 0.685400 0.811532 0.733065
x2_4 0.494432 0.543735 0.478477 0.420029 0.495066 0.539432 0.605326 0.700862 0.779416 0.578911 ... 0.582550 0.700130 0.441052 0.474314 0.543238 0.639982 0.526334 0.583017 0.685410 0.811534
x3_1 0.592259 0.721416 0.786957 0.666275 0.717551 0.572260 0.497059 0.440119 0.414270 0.700338 ... 0.630130 0.548277 0.793759 0.656850 0.525476 0.453345 0.774776 0.633650 0.530910 0.476484
x3_2 0.537084 0.642176 0.630271 0.527061 0.595080 0.592142 0.572138 0.496930 0.440097 0.721346 ... 0.786805 0.630022 0.666220 0.793698 0.656804 0.525356 0.717453 0.774734 0.633642 0.530896
x3_3 0.480812 0.569185 0.548417 0.452001 0.516355 0.536941 0.592017 0.572020 0.496922 0.642084 ... 1.000000 0.786741 0.526985 0.666135 0.793686 0.656713 0.594925 0.717400 0.774760 0.633651
x3_4 0.440096 0.522505 0.515298 0.411746 0.469410 0.480677 0.536820 0.591912 0.571991 0.569084 ... 0.786741 1.000000 0.451915 0.526878 0.666086 0.793629 0.516205 0.594845 0.717390 0.774751
x4_1 0.658057 0.642308 0.656966 0.739909 0.739108 0.673517 0.570611 0.501531 0.457391 0.640252 ... 0.526985 0.451915 1.000000 0.739906 0.569567 0.476063 0.844499 0.674290 0.544067 0.476741
x4_2 0.582857 0.565833 0.525570 0.569562 0.595171 0.657984 0.673431 0.570581 0.501598 0.642185 ... 0.666135 0.526878 0.739906 1.000000 0.739905 0.569519 0.739029 0.844476 0.674317 0.544113
x4_3 0.513447 0.492845 0.453505 0.476092 0.503852 0.582808 0.657929 0.673430 0.570630 0.565730 ... 0.793686 0.666086 0.569567 0.739905 1.000000 0.739898 0.595109 0.739012 0.844488 0.674347
x4_4 0.461159 0.448007 0.416872 0.429567 0.449736 0.513344 0.582708 0.657888 0.673463 0.492713 ... 0.656713 0.793629 0.476063 0.569519 0.739898 1.000000 0.503717 0.595055 0.739033 0.844517
target_1 0.748016 0.733317 0.633741 0.674316 0.788732 0.818379 0.689261 0.587621 0.526309 0.811684 ... 0.594925 0.516205 0.844499 0.739029 0.595109 0.503717 1.000000 0.788691 0.635069 0.544432
target_2 0.645322 0.630401 0.530960 0.544038 0.635080 0.747972 0.818340 0.689219 0.587631 0.733242 ... 0.717400 0.594845 0.674290 0.844476 0.739012 0.595055 0.788691 1.000000 0.788697 0.635077
target_3 0.562314 0.551571 0.476539 0.476691 0.544451 0.645316 0.747959 0.818371 0.689256 0.630326 ... 0.774760 0.717390 0.544067 0.674317 0.844488 0.739033 0.635069 0.788697 1.000000 0.788718
target_4 0.507240 0.502079 0.452356 0.442275 0.496099 0.562307 0.645293 0.748004 0.818402 0.551481 ... 0.633651 0.774751 0.476741 0.544113 0.674347 0.844517 0.544432 0.635077 0.788718 1.000000

25 rows × 25 columns

In [ ]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
colormap = plt.cm.RdBu
plt.figure(figsize=(15, 10))
plt.title('Correlations with lags up to 4 days', y=1.05, size=16)

# Boolean mask hiding the redundant upper triangle of the symmetric matrix.
mask = np.zeros_like(df_new.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True

svm = sns.heatmap(df_new.corr(), mask=mask, linewidths=0.1, vmax=1.0,
                  square=True, cmap=colormap, linecolor='white', annot=True)
In [ ]:
# Normalize the data, saving each column's min/max for later denormalization.

min_raw = df_etos.min()
max_raw = df_etos.max()

norm_df = normalize(df_etos)
df = df_etos  # keep a reference to the raw (unnormalized) data
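`normalize` itself, assuming column-wise min-max scaling to [0, 1], can be sketched as follows (the name `normalize_sketch` and the toy frame are illustrative):

```python
import pandas as pd

def normalize_sketch(df):
    """Column-wise min-max scaling to the [0, 1] interval."""
    return (df - df.min()) / (df.max() - df.min())

toy = pd.DataFrame({'x': [1.0, 3.0, 2.0]})
print(normalize_sketch(toy)['x'].tolist())  # [0.0, 1.0, 0.5]
```

Scaling all series to a common range keeps the VAR coefficients comparable across stations with different ETo levels.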
In [ ]:
# Split the data into training and test sets (80% / 20%).

size = int(len(norm_df) * 0.80)
train, test = norm_df[0:size], norm_df[size:len(norm_df)]

# VAR model.

order = 3
model = VAR(train.values)
model_fit = model.fit(maxlags=order)
lag_order = model_fit.k_ar

print(f'lag_order = {lag_order}\n')
print(model_fit.summary())
lag_order = 3

  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Mon, 11, Dec, 2023
Time:                     22:35:40
--------------------------------------------------------------------
No. of Equations:         5.00000    BIC:                   -24.0400
Nobs:                     5841.00    HQIC:                  -24.0996
Log likelihood:           29115.6    FPE:                3.31034e-11
AIC:                     -24.1314    Det(Omega_mle):     3.26537e-11
--------------------------------------------------------------------
Results for equation y1
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.038429         0.003985            9.643           0.000
L1.y1         0.497902         0.016742           29.740           0.000
L1.y2         0.126919         0.014987            8.469           0.000
L1.y3         0.124682         0.015239            8.182           0.000
L1.y4         0.103962         0.015938            6.523           0.000
L1.y5         0.086291         0.019306            4.470           0.000
L2.y1         0.091085         0.018063            5.043           0.000
L2.y2        -0.030933         0.015971           -1.937           0.053
L2.y3        -0.083846         0.017289           -4.850           0.000
L2.y4        -0.021265         0.016860           -1.261           0.207
L2.y5        -0.039503         0.019973           -1.978           0.048
L3.y1         0.087351         0.016260            5.372           0.000
L3.y2        -0.004889         0.014590           -0.335           0.738
L3.y3        -0.021440         0.016006           -1.340           0.180
L3.y4         0.034687         0.016087            2.156           0.031
L3.y5        -0.055626         0.019244           -2.891           0.004
========================================================================

Results for equation y2
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.017179         0.004346            3.953           0.000
L1.y1         0.148346         0.018257            8.126           0.000
L1.y2         0.423960         0.016343           25.941           0.000
L1.y3         0.346178         0.016618           20.831           0.000
L1.y4         0.046377         0.017380            2.668           0.008
L1.y5         0.014610         0.021053            0.694           0.488
L2.y1        -0.029502         0.019698           -1.498           0.134
L2.y2         0.028271         0.017416            1.623           0.105
L2.y3        -0.049274         0.018854           -2.614           0.009
L2.y4        -0.058452         0.018386           -3.179           0.001
L2.y5        -0.006885         0.021781           -0.316           0.752
L3.y1         0.034696         0.017732            1.957           0.050
L3.y2         0.088317         0.015910            5.551           0.000
L3.y3         0.009865         0.017454            0.565           0.572
L3.y4        -0.012436         0.017543           -0.709           0.478
L3.y5        -0.043133         0.020985           -2.055           0.040
========================================================================

Results for equation y3
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.060087         0.004596           13.075           0.000
L1.y1         0.046026         0.019306            2.384           0.017
L1.y2         0.004992         0.017282            0.289           0.773
L1.y3         0.672639         0.017573           38.276           0.000
L1.y4         0.147630         0.018379            8.033           0.000
L1.y5        -0.051329         0.022263           -2.306           0.021
L2.y1        -0.025661         0.020830           -1.232           0.218
L2.y2        -0.013107         0.018417           -0.712           0.477
L2.y3        -0.002259         0.019937           -0.113           0.910
L2.y4        -0.091614         0.019442           -4.712           0.000
L2.y5         0.007301         0.023032            0.317           0.751
L3.y1         0.018079         0.018751            0.964           0.335
L3.y2         0.120310         0.016824            7.151           0.000
L3.y3         0.066509         0.018457            3.603           0.000
L3.y4        -0.020918         0.018551           -1.128           0.259
L3.y5        -0.016818         0.022192           -0.758           0.449
========================================================================

Results for equation y4
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.097556         0.004904           19.892           0.000
L1.y1         0.078847         0.020603            3.827           0.000
L1.y2        -0.022295         0.018443           -1.209           0.227
L1.y3         0.275957         0.018754           14.715           0.000
L1.y4         0.494779         0.019614           25.226           0.000
L1.y5         0.025223         0.023759            1.062           0.288
L2.y1         0.035668         0.022229            1.605           0.109
L2.y2         0.004500         0.019654            0.229           0.819
L2.y3        -0.130471         0.021277           -6.132           0.000
L2.y4         0.024574         0.020748            1.184           0.236
L2.y5        -0.046558         0.024580           -1.894           0.058
L3.y1         0.045880         0.020011            2.293           0.022
L3.y2         0.046761         0.017954            2.604           0.009
L3.y3        -0.007318         0.019697           -0.372           0.710
L3.y4         0.042938         0.019797            2.169           0.030
L3.y5        -0.024864         0.023682           -1.050           0.294
========================================================================

Results for equation y5
========================================================================
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         0.072899         0.004437           16.430           0.000
L1.y1         0.155153         0.018640            8.324           0.000
L1.y2         0.074389         0.016686            4.458           0.000
L1.y3         0.295725         0.016967           17.429           0.000
L1.y4         0.204022         0.017745           11.498           0.000
L1.y5         0.301184         0.021495           14.012           0.000
L2.y1        -0.001423         0.020111           -0.071           0.944
L2.y2         0.003697         0.017782            0.208           0.835
L2.y3        -0.116327         0.019249           -6.043           0.000
L2.y4        -0.068600         0.018772           -3.654           0.000
L2.y5         0.009651         0.022238            0.434           0.664
L3.y1         0.048895         0.018104            2.701           0.007
L3.y2         0.042151         0.016244            2.595           0.009
L3.y3         0.000694         0.017820            0.039           0.969
L3.y4         0.000697         0.017911            0.039           0.969
L3.y5        -0.019193         0.021426           -0.896           0.370
========================================================================

Correlation matrix of residuals
            y1        y2        y3        y4        y5
y1    1.000000  0.499386  0.209582  0.341164  0.556666
y2    0.499386  1.000000  0.316625  0.255946  0.517417
y3    0.209582  0.316625  1.000000  0.628955  0.550458
y4    0.341164  0.255946  0.628955  1.000000  0.662607
y5    0.556666  0.517417  0.550458  0.662607  1.000000



In [ ]:
fc = model_fit.forecast(y=test.values, steps=3)  # prevendo 3 passos à frente para todas as variáveis; o resultado está normalizado e precisa ser desnormalizado
# print(f'{fc}')
dfc = denormalize(fc, min_raw, max_raw)
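A função `denormalize` foi definida em células anteriores do notebook; uma forma usual de inverter a normalização min-max, apenas como esboço ilustrativo (assumindo que `min_raw` e `max_raw` guardam os mínimos e máximos das séries originais):

```python
import numpy as np

def denormalize(values, vmin, vmax):
    """Inverte a normalização min-max: x = x_norm * (max - min) + min."""
    return np.asarray(values) * (np.asarray(vmax) - np.asarray(vmin)) + np.asarray(vmin)

print(denormalize([0.0, 0.5, 1.0], 2.0, 6.0))  # → [2. 4. 6.]
```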
In [ ]:
# Aplicando a previsão aos dados, usando rolling forecasting.

order = 1

history = [row for row in train.values]  # janela de observações, expandida a cada passo
predictions = []

for t in range(len(test)):
	model = VAR(train.values)                  # modelo reajustado sobre o conjunto de treino
	model_fit = model.fit(maxlags=order)
	fc = model_fit.forecast(y=test[:t+1].values, steps=1)  # previsão 1 passo à frente
	yhat = fc[0][4]                            # 5ª variável: ETo da base principal
	predictions.append(yhat)
	obs = test['target'].iloc[t]
	history.append(obs)
	if t < 3 or t >= (len(test) - 3):
		print('previsão=%f, observado=%f' % (yhat, obs))
previsão=0.731906, observado=0.716667
previsão=0.532807, observado=0.433333
previsão=0.368967, observado=0.316667
...
previsão=0.743142, observado=0.683333
previsão=0.764702, observado=0.766667
previsão=0.777312, observado=0.850000
In [ ]:
# Medindo o desempenho do modelo a partir das métricas: MAE, RMSE e MAPE

mae_var_ext = mean_absolute_error(test['target'], predictions)
print('Test MAE: %.3f' % mae_var_ext)

rmse_var_ext = sqrt(mean_squared_error(test['target'], predictions))
print('Test RMSE: %.3f' % rmse_var_ext)

mape_var_ext = mean_absolute_percentage_error(test['target'], predictions)
print('Test MAPE: %.3f' % mape_var_ext)
Test MAE: 0.052
Test RMSE: 0.067
Test MAPE: 0.144
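Vale notar que o MAPE é sensível a valores observados próximos de zero, situação comum em séries normalizadas. Um exemplo numérico mínimo (valores hipotéticos) das três métricas com o scikit-learn:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error)

y_true = np.array([0.7, 0.4, 0.3, 0.8])
y_pred = np.array([0.73, 0.53, 0.37, 0.77])

mae = mean_absolute_error(y_true, y_pred)            # média dos erros absolutos
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # penaliza mais os erros grandes
mape = mean_absolute_percentage_error(y_true, y_pred)  # erro relativo médio

print(f'MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.3f}')
```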
In [ ]:
fig, ax = plt.subplots()

ax.plot(df_target['ETo'].index[size:len(df_target)], test['target'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')

ax.legend(loc='best')

ax.set_xlabel('Year')
ax.set_ylabel('Value')

# ax.axvline(df_target['ETo'].index[size], color='#808080', linestyle='-', alpha=0.2)
# ax.axvline(df_target['ETo'].index[len(X) - 1], color='#808080', linestyle='-', alpha=0.2)
# ax.axvspan(df_target['ETo'].index[size], df_target['ETo'].index[len(df_target) - 1], color='#808080', alpha=0.2) # zona cinza

plt.title("ETo de Pará de Minas/MG")

fig.autofmt_xdate()
plt.tight_layout()
In [ ]:
# Ilustrando graficamente o desempenho do modelo VAR, agora incluindo o conjunto de treino.

fig, ax = plt.subplots()

ax.plot(df_target['ETo'].index[0:size], train['target'], 'g-.', label='Train')
ax.plot(df_target['ETo'].index[size:len(df_target)], test['target'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')

ax.legend(loc='best')

ax.set_xlabel('Year')
ax.set_ylabel('Value')

# ax.axvline(df_target['ETo'].index[size], color='#808080', linestyle='-', alpha=0.2)
# ax.axvline(df_target['ETo'].index[len(X) - 1], color='#808080', linestyle='-', alpha=0.2)
# ax.axvspan(df_target['ETo'].index[size], df_target['ETo'].index[len(X) - 1], color='#808080', alpha=0.2) # zona cinza

ax.axvspan(df_target['ETo'].index[size], df_target['ETo'].index[len(df_target) - 1], color='#808080', alpha=0.2) # zona cinza


plt.title("ETo de Pará de Minas/MG")

fig.autofmt_xdate()
plt.tight_layout()
In [ ]:
metrics_names = ['mae_ar', 'mae_var_int', 'mae_var_ext',
                 'rmse_ar', 'rmse_var_int', 'rmse_var_ext',
                 'mape_ar', 'mape_var_int', 'mape_var_ext']
metrics_values = [mae_ar, mae_var_int, mae_var_ext,
                  rmse_ar, rmse_var_int, rmse_var_ext,
                  mape_ar, mape_var_int, mape_var_ext]
colors = ['tab:red', 'tab:blue', 'tab:orange']  # repetidas ciclicamente pelas 9 barras
# Rótulos iniciados com '_' são ocultados da legenda pelo matplotlib.
bar_labels = ['AR ETo Principal',
              'VAR Parâmetros Meteorológicos ETo Principal',
              'VAR ETos em torno da ETo Principal',
              '_AR ETo Principal', '_VAR Parâmetros Meteorológicos ETo Principal', '_VAR ETos em torno da ETo Principal',
              '_AR ETo Principal', '_VAR Parâmetros Meteorológicos ETo Principal', '_VAR ETos em torno da ETo Principal']

fig, ax = plt.subplots()

bar_container = ax.bar(metrics_names, metrics_values, color=colors, label=bar_labels)
ax.set(ylabel='Values', xlabel='Metrics', title='Assessment Results') #, ylim=(0, 100))
ax.bar_label(bar_container, fmt='{:,.2f}')
ax.legend(title='Models')
Out[ ]:
<matplotlib.legend.Legend at 0x2c067b27e50>

Considerações: Observa-se, em todas as métricas utilizadas, que o modelo de previsão multivariado, VAR, que utiliza as variáveis "internas" (Rs, u2, Tmax, Tmin, RH, pr) da base principal (Pará de Minas/MG) para prever a ETo teve o melhor desempenho. O modelo multivariado que utiliza as séries de ETo das outras bases selecionadas (Francisco Dumont/MG, Bom Jesus do Galho/MG, Carvalhos/MG e Argenita/MG), para prever a ETo da base principal (Pará de Minas/MG), teve desempenho ligeiramente inferior. O modelo univariado, AR, de ordem 1, foi o que apresentou pior desempenho.